A Computational Analysis of Information Structure Using Parallel Expository Texts in English and Japanese

Authors

  • Nobo N. Komagata
  • Seth Kulick
  • Rashmi Prasad
  • Anoop Sarkar
  • Sadao Kurohashi
  • Kathy McKeown
  • K. Vijay-Shanker
  • Naoya Arakawa
  • Akira Nagasawa
Abstract

Nobo N. Komagata. Supervisor: Dr. Mark J. Steedman.

This thesis concerns the notion of ‘information structure’: informally, the organization of information in an utterance with respect to the context. Information structure has been recognized as a critical element in a number of computer applications, e.g., the selection of contextually appropriate forms in machine translation and speech generation, and the analysis of text readability in computer-assisted writing systems. One of the problems involved in these applications is how to identify information structure in extended texts. This problem is often ignored, assumed to be trivial, or reduced to a sub-problem that does not correspond to the complexity of realistic texts. A handful of computational proposals face the problem directly, but they are generally limited in coverage and all suffer from a lack of evaluation. To fully demonstrate the usefulness of information structure, it is essential to apply a theory of information structure to the identification problem and to provide an evaluation method. This thesis adopts a classic theory of information structure as a binomial partition between theme and rheme, and captures the property of theme as a requirement of the contextual-link status. The notion of ‘contextual link’ is further specified in terms of discourse status, domain-specific knowledge, and linguistic marking. The relation between theme and rheme is identified as the semantic composition of the two, and linked to surface syntactic structure using Combinatory

Similar resources

Cohesive Readability of Expository Texts and Reading Comprehension Performance: Iranian EFL students of Different Proficiency Levels in Focus

The present study is an attempt to investigate the relationship between cohesive readability of expository texts and reading comprehension in EFL students with different proficiency levels. One hundred students participated in this study. They were undergraduate students majoring in English at University of Isfahan. To collect the relevant data, participants were divide...

Identifying Information Structure in Expository Texts

Identification of ‘information structure’ (organization of information within an utterance) is crucial for processing contextually-appropriate linguistic forms in many NLP systems. This paper presents a theory and a procedure to identify information structure in expository texts in English, and also an evaluation method that is used to validate the results of the identification process for real...

Lexical Cohesion in English and Persian Abstracts

This study compares and contrasts lexical cohesion in English and Persian abstracts of Iranian medical students’ theses to appreciate textualization processes in the two languages. For this purpose, one hundred English and Persian abstracts were selected randomly and analyzed based on Seddigh and Yarmohamadi’s (1996) lexical cohesion framework, a version of Halliday and Hasan’s (1976) and Halli...

Fairy Tales and ESL Texts: An Analysis of Linguistic Features Using the Gramulator

This chapter describes a study that investigates the potential value of using traditional fairy tales as reading material for English language learners (ELL). Using the computational textual analysis software, the Gramulator, the authors analyzed the linguistic features of fairy tales relative to a corpus of ELL reading material and a corpus of baseline educational texts for native English spea...

Publication date: 1999